Online Algorithms for Dynamic Matching Markets in Power Distribution Systems
This paper proposes online algorithms for dynamic matching markets in power
distribution systems, which at any real-time operation instance decide about
matching -- or delaying the supply of -- flexible loads with available
renewable generation with the objective of maximizing the social welfare of the
exchange in the system. More specifically, two online matching algorithms are
proposed for the following generation-load scenarios: (i) when the mean of
renewable generation is greater than the mean of the flexible load, and (ii)
when the condition (i) is reversed. With the intuition that the performance of
such algorithms degrades with increasing randomness of the supply and demand,
two properties are proposed for assessing the performance of the algorithms.
The first property is convergence to optimality (CO) as the underlying randomness
of renewable generation and customer loads goes to zero. The second property,
deviation from optimality, is measured as a function of the standard deviation
of the underlying randomness of renewable generation and customer loads. The
algorithm proposed for the first scenario is shown to satisfy CO and to have a
deviation from optimality that varies linearly with the standard deviation.
However, the same algorithm is shown not to satisfy CO for the second
scenario. We then show that the algorithm proposed for the second scenario
satisfies CO and has a deviation from optimality that varies linearly with the
standard deviation, plus an offset.
Meta-Learning Guarantees for Online Receding Horizon Learning Control
In this paper we provide provable regret guarantees for an online
meta-learning receding horizon control algorithm in an iterative control
setting. We consider the setting where, in each iteration, the system to be
controlled is a linear deterministic system that is different and unknown, the
cost for the controller in an iteration is a general additive cost function and
there are affine control input constraints. By analysing conditions under which
sub-linear regret is achievable, we prove that the online receding horizon
controller achieves a regret for the controller cost and constraint violation
that are with respect to the best policy that satisfies
the control input constraints, when the preview of the cost functions
is limited to an interval and the interval size is doubled from one to the
next. We then show that the average of the regret for the controller cost and
constraint violation with respect to the same policy varies as
with the number of iterations , under the
same setting. Comment: arXiv admin note: substantial text overlap with arXiv:2008.13265,
arXiv:2010.0726
Mechanism Design for Demand Response Programs
Demand Response (DR) programs serve to reduce the consumption of electricity
at times when the supply is scarce and expensive. The utility informs the
aggregator of an anticipated DR event. The aggregator calls on a subset of its
pool of recruited agents to reduce their electricity use. Agents are paid for
reducing their energy consumption from contractually established baselines.
Baselines are counter-factual consumption estimates of the energy an agent
would have consumed if they were not participating in the DR program. Baselines
are used to determine payments to agents. This creates an incentive for agents
to inflate their baselines. We propose a novel self-reported baseline mechanism
(SRBM) where each agent reports its baseline and marginal utility. These
reports are strategic and need not be truthful. Based on the reported
information, the aggregator selects or calls on agents to meet the load
reduction target. Called agents are paid for observed reductions from their
self-reported baselines. Agents who are not called face penalties for
consumption shortfalls below their baselines. The mechanism is specified by the
probability with which agents are called, reward prices for called agents, and
penalty prices for agents who are not called. Under SRBM, we show that truthful
reporting of baseline consumption and marginal utility is a dominant strategy.
Thus, SRBM eliminates the incentive for agents to inflate baselines. SRBM is
assured to meet the load reduction target. SRBM is also nearly efficient since
it selects agents with the smallest marginal utilities, and each called agent
contributes maximally to the load reduction target. Finally, we show that SRBM
is almost optimal in the metric of average cost of DR provision faced by the
aggregator.
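The selection and settlement steps described in this abstract can be sketched as
follows. This is a simplified illustration, not the paper's mechanism: the names
Report, select_agents and settle are hypothetical, selection is a deterministic
greedy rule rather than the calling probabilities the abstract mentions, and each
called agent is assumed to be able to reduce its full reported baseline.

```python
from dataclasses import dataclass


@dataclass
class Report:
    agent: str
    baseline: float          # self-reported baseline consumption (assumed kWh)
    marginal_utility: float  # self-reported marginal utility (assumed $/kWh)


def select_agents(reports, target):
    """Call agents in order of increasing reported marginal utility
    until their reported baselines cover the load reduction target."""
    called, covered = [], 0.0
    for r in sorted(reports, key=lambda r: r.marginal_utility):
        if covered >= target:
            break
        called.append(r.agent)
        covered += r.baseline
    return called


def settle(report, consumption, was_called, reward_price, penalty_price):
    """Net payment to one agent (positive means the agent is paid)."""
    gap = max(report.baseline - consumption, 0.0)
    if was_called:
        # Called agents are rewarded for the observed reduction
        # from their self-reported baseline.
        return reward_price * gap
    # Agents that were not called are penalized for consuming
    # less than the baseline they reported.
    return -penalty_price * gap
```

The penalty on agents that are not called is what makes an inflated baseline
costly: an agent that over-reports and is not called consumes below its report and
pays for the difference.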
Online Learning Robust Control of Nonlinear Dynamical Systems
In this work we address the problem of the online robust control of nonlinear
dynamical systems perturbed by disturbance. We study the problem of attenuation
of the total cost over a duration in response to the disturbances. We
consider the setting where the cost function (at a particular time) is a
general continuous function and adversarial, the disturbance is adversarial and
bounded at any point of time. Our goal is to design a controller that can learn
and adapt to achieve a certain level of attenuation. We analyse two cases: (i)
when the system is known and (ii) when the system is unknown. We measure the
performance of the controller by the deviation of the controller's cost for a
sequence of cost functions with respect to an attenuation , . We
propose an online controller and present guarantees for the metric when
the maximum possible attenuation is given by , which is a
system constant. We show that when the controller has preview of the cost
functions and the disturbances for a short duration of time and the system is
known when , where . We then show that when the system is unknown
the proposed controller with a preview of the cost functions and the
disturbances for a short horizon achieves , when , where
is the accuracy of a given nonlinear estimator and is the duration
of the initial estimation period. We also characterize the lower bound on the
required prediction horizon for these guarantees to hold in terms of the system
constants.
Online Learning for Incentive-Based Demand Response
In this paper, we consider the problem of learning online to manage Demand
Response (DR) resources. A typical DR mechanism requires the DR manager to
assign a baseline to the participating consumer, where the baseline is an
estimate of the counterfactual consumption of the consumer had it not been
called to provide the DR service. A challenge in estimating baseline is the
incentive the consumer has to inflate the baseline estimate. We consider the
problem of learning online to estimate the baseline and to optimize the
operating costs over a period of time under such incentives. We propose an
online learning scheme that employs least-squares for estimation with a
perturbation to the reward price (for the DR services or load curtailment) that
is designed to balance the exploration and exploitation trade-off that arises
with online learning. We show that our proposed scheme is able to achieve a
very low regret of with respect to the
optimal operating cost over days of the DR program with full knowledge of
the baseline, and is individually rational for the consumers to participate.
Our scheme is significantly better than the averaging type approach, which only
fetches regret
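The idea of pairing least-squares estimation with a perturbed reward price can be
illustrated with a toy simulation. Everything specific here is an assumption made
for illustration and is not taken from the paper: the linear response model
consumption = baseline - sensitivity * price, the decay exponent of the
perturbation, and the function names fit_line, perturbed_price and simulate.

```python
import random


def fit_line(prices, consumptions):
    """Ordinary least squares for consumption = baseline - sensitivity * price."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_c = sum(consumptions) / n
    var_p = sum((p - mean_p) ** 2 for p in prices)
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(prices, consumptions))
    slope = cov / var_p
    baseline = mean_c - slope * mean_p
    return baseline, -slope


def perturbed_price(nominal_price, day, rng, scale=0.5):
    """Nominal price plus a zero-mean perturbation that shrinks as days pass.
    The perturbation keeps the regressor varying (exploration), and its decay
    limits the cost of deviating from the nominal price (exploitation)."""
    return nominal_price + scale * rng.uniform(-1.0, 1.0) / day ** 0.25


def simulate(days=200, true_baseline=10.0, true_sensitivity=2.0, seed=0):
    """Run the toy DR program and return the estimated (baseline, sensitivity)."""
    rng = random.Random(seed)
    prices, consumptions = [], []
    for day in range(1, days + 1):
        p = perturbed_price(1.0, day, rng)
        c = true_baseline - true_sensitivity * p + rng.gauss(0.0, 0.1)
        prices.append(p)
        consumptions.append(c)
    return fit_line(prices, consumptions)
```

Without the perturbation the price would be constant, var_p would be zero, and the
baseline could not be separated from the price response. That is the
exploration-exploitation trade-off the abstract refers to.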
Regret Guarantees for Online Receding Horizon Learning Control
We address the problem of controlling an unknown linear dynamical system with
general cost functions and affine constraints on the control input through
online learning. Our goal is to develop an algorithm that minimizes the regret,
which is defined as the difference between the cumulative cost incurred by the
algorithm and that of a receding horizon controller (RHC) with full knowledge
of the system and state and that satisfies the control input constraints. Minimizing
this performance metric is harder than minimizing the regret w.r.t. the best linear
feedback controller commonly adopted in the literature, because the linear
controllers might be sub-optimal or violate the constraints throughout. By
exploring the conditions under which sub-linear regret is guaranteed, we
propose an online receding horizon controller that learns the unknown system
parameter from the sequential observation along with the necessary perturbation
for exploration. We show that the proposed controller's performance is upper
bounded by for both regret and cumulative
constraint violation when the controller has a preview of the cost functions for
an interval that doubles in size from one interval to the next. We also show
that an improved upper bound of can be achieved for
both regret and cumulative constraint violation when the controller has a full
preview of the cost functions. Comment: arXiv admin note: text overlap with arXiv:2010.1132
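The preview schedule in which each interval is twice as long as the previous one
can be sketched as a small generator. This shows only the generic doubling
partition of a horizon and not the controller itself. The function name
doubling_intervals and the choice of a first interval of length 1 are assumptions.

```python
def doubling_intervals(horizon, first_length=1):
    """Yield half-open (start, end) intervals that cover range(horizon).
    Each interval is twice as long as the previous one; the last interval
    is truncated at the horizon."""
    start, length = 0, first_length
    while start < horizon:
        end = min(start + length, horizon)
        yield start, end
        start, length = end, 2 * length


if __name__ == "__main__":
    # Within each interval the controller would see the cost functions
    # for that interval only.
    for start, end in doubling_intervals(10):
        print(f"preview covers steps {start}..{end - 1}")
```

Because the lengths grow geometrically, a horizon of T steps is covered by a
number of intervals that grows only logarithmically in T.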